Bibliography
[55] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT:
Pre-training of deep bidirectional transformers for language understanding. In
NAACL-HLT, 2019.
[56] Ruizhou Ding, Ting-Wu Chin, Zeye Liu, and Diana Marculescu. Regularizing
activation distribution for training binarized deep networks. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
11408–11417, 2019.
[57] Ruizhou Ding, Zeye Liu, Rongye Shi, Diana Marculescu, and RD Blanton. LightNN:
Filling the gap between conventional deep neural networks and binarized networks.
In Proceedings of the Great Lakes Symposium on VLSI 2017, pages 35–40, 2017.
[58] Paul Adrien Maurice Dirac. The physical interpretation of the quantum dynamics.
Proceedings of the Royal Society of London. Series A, Containing Papers of a
Mathematical and Physical Character, 113(765):621–641, 1927.
[59] Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W Mahoney, and
Kurt Keutzer. HAWQ-V2: Hessian aware trace-weighted quantization of neural
networks. In Neural Information Processing Systems (NeurIPS), pages 18518–18529,
2020.
[60] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua
Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold,
Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image
recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[61] Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy,
and Dharmendra S Modha. Learned step size quantization. arXiv preprint
arXiv:1902.08153, 2019.
[62] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew
Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of
Computer Vision, 88(2):303–338, 2010.
[63] Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M Roy, and Ali
Ramezani-Kebrya. Adaptive gradient quantization for data-parallel SGD. Advances in
Neural Information Processing Systems, 33:3174–3185, 2020.
[64] Angela Fan, Edouard Grave, and Armand Joulin. Reducing transformer depth on
demand with structured dropout. arXiv preprint arXiv:1909.11556, 2019.
[65] Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Rémi Gribonval, Hervé
Jégou, and Armand Joulin. Training with quantization noise for extreme model
compression. arXiv preprint arXiv:2004.07320, 2020.
[66] Pedro Felzenszwalb and Ramin Zabih. Discrete optimization algorithms in computer
vision. Tutorial at IEEE International Conference on Computer Vision, 2007.
[67] Yoav Freund, Robert E Schapire, et al. Experiments with a new boosting algorithm.
In ICML, volume 96, pages 148–156. Citeseer, 1996.
[68] D. Gabor. Theory of communication. Journal of the Institution of Electrical
Engineers - Part III: Radio and Communication Engineering, 1946.